DEPTH: A Novel Algorithm for Feature Ranking with Application to Genome-Wide Association Studies
Authors
Abstract
Analysis of genome-wide association studies (GWAS) involves selecting variables from a large set of candidates, a problem common across the sciences. The sample could contain people affected by a disease (cases) and people who are not affected by the disease (controls), though other designs are possible and the approach below also applies to non-human samples. The ultimate aim of a GWAS is to identify susceptibility genes, and the dominant paradigm to date has been to do this by identifying single nucleotide polymorphisms (SNPs) associated with risk of disease. We propose a new algorithm for variable selection, called DEPendence of association on the number of Top Hits (DEPTH), based on permutation statistics and stability selection. DEPTH (i) is applicable to any parametric regression model, (ii) is designed to be run in a parallel computing environment, and (iii) exploits information from the correlation structure of the predictors. In the context of a GWAS, the algorithm can be used for all SNPs in the genome, or for any subset of SNPs (e.g. those in a region, in a pathway, or in a functional class). In comparisons on simulated data, the algorithm performed well against several established procedures. We applied DEPTH to a breast cancer GWAS of only 204 cases (about 50:50 ER-ve and ER+ve) and 287 controls. DEPTH found evidence that variants in a particular gene are associated with ER-ve breast cancer, and more strongly than with ER+ve disease. This finding has been replicated using a large international breast cancer case-control GWAS data set (55,540 cases, 51,168 controls), and would probably not have been identified by conventional analyses. DEPTH is currently running on an IBM BlueGene/Q supercomputer, though versions can also be run on a laptop for targeted analyses. A summary of findings and insights into aetiology and method performance will be presented.
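The abstract does not specify how DEPTH combines permutation statistics with stability selection, so the following is only a generic sketch of that combination and not the authors' algorithm. It assumes a marginal absolute-correlation score per SNP, half-sample resampling, and a fixed number of top hits `k`; the function names `top_hit_frequencies` and `permutation_pvalues` and the simulated data are hypothetical.

```python
import numpy as np

def top_hit_frequencies(X, y, k=5, n_resamples=100, rng=None):
    """Fraction of half-sample resamples in which each feature is among the top k."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_resamples):
        idx = rng.choice(n, size=n // 2, replace=False)  # half-sample without replacement
        Xc = X[idx] - X[idx].mean(axis=0)
        yc = y[idx] - y[idx].mean()
        # Assumed score: absolute marginal correlation of each feature with the outcome.
        denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
        score = np.abs(Xc.T @ yc) / denom
        counts[np.argsort(score)[-k:]] += 1  # record the k highest-scoring features
    return counts / n_resamples

def permutation_pvalues(X, y, k=5, n_resamples=100, n_perm=20, seed=0):
    """Compare observed top-hit frequencies with those obtained under permuted outcomes."""
    rng = np.random.default_rng(seed)
    observed = top_hit_frequencies(X, y, k, n_resamples, rng)
    exceed = np.zeros(X.shape[1])
    for _ in range(n_perm):
        y_perm = rng.permutation(y)  # breaks any genotype-outcome association
        null = top_hit_frequencies(X, y_perm, k, n_resamples, rng)
        exceed += null >= observed
    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)

# Simulated example (hypothetical data): 200 subjects, 30 SNPs coded 0/1/2,
# with carriers of the minor allele at SNP 3 mostly being cases.
sim = np.random.default_rng(1)
X = sim.binomial(2, 0.3, size=(200, 30)).astype(float)
carrier = X[:, 3] >= 1
flip = sim.random(200) < 0.1  # 10% of outcomes are flipped as noise
y = (carrier ^ flip).astype(float)

freq, pvals = permutation_pvalues(X, y)
print("top-hit frequency of SNP 3:", freq[3])
print("permutation p-value of SNP 3:", pvals[3])
```

Because each resample and each permutation is independent, the two loops could be distributed across workers, which is in the spirit of the parallel design mentioned in the abstract. Repeating the procedure over several values of `k` would show how the evidence depends on the number of top hits.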
Similar resources
Genome Wide Association Studies, Next Generation Sequencing and Their Application in Animal Breeding and Genetics: A Review
Recently genetic studies have been revolutionized by next generation sequencing (NGS) technology, and it is expected that the use of this technology will largely eliminate defects in the methods of association studies. The NGS technology is becoming the premier tool in genetics. However, at the moment the use of this method is limited especially in the livestock due to high cost and computation...
Genome-wide Association Study to Identify Genes and Biological Pathways Associated with Type Traits in Cattle using Pathway Analysis
Extended Abstract. Introduction and Objective: Type traits describing the skeletal characteristics of an animal are moderately to strongly genetically correlated with other economically important traits in cattle, including fertility, longevity and carcass traits. The present study aimed to conduct a genome-wide association study (GWAS) based on gene-set enrichment analysis for identifying the ...
Phenotype Prediction and Feature Selection in Genome-Wide Association Studies
By Andrew Roberts. Genome-wide association studies (GWAS) search for correlations between single nucleotide polymorphisms (SNPs) in a subject genome and an observed phenotype. GWAS can be used to generate models for predicting phenotype based on genotype, as well as aiding in identification of specific genes affecting ...
Neuro-Fuzzy Based Algorithm for Online Dynamic Voltage Stability Status Prediction Using Wide-Area Phasor Measurements
In this paper, a novel neuro-fuzzy based method combined with a feature selection technique is proposed for online dynamic voltage stability status prediction of power system. This technique uses synchronized phasors measured by phasor measurement units (PMUs) in a wide-area measurement system. In order to minimize the number of neuro-fuzzy inputs, training time and complication of neuro-fuzzy ...
Camera Pose Estimation in Unknown Environments using a Sequence of Wide-Baseline Monocular Images
In this paper, a feature-based technique for the camera pose estimation in a sequence of wide-baseline images has been proposed. Camera pose estimation is an important issue in many computer vision and robotics applications, such as, augmented reality and visual SLAM. The proposed method can track captured images taken by hand-held camera in room-sized workspaces with maximum scene depth of 3-4...